Solar Physics 
DOI: 10.1007/' 



A Multi- Wavelength Analysis of Active Regions and 
Sunspots by Comparison of Automatic Detection 
Algorithms 

C. Verbeeck 1 • P.A. Higgins 2 • T. Colak 3 • 
F.T. Watson 4 • V. Delouille 1 • 
B. Mampaey 1 • R. Qahwaji 3 

, © Springer •••• 

^ '■ Abstract Since the Solar Dynamics Observatory (SDO) began recording ~ 1 TB 

, of data per day, there has been an increased need to automatically extract 

■ features and events for further analysis. Here we compare the overall detection 

performance, correlations between extracted properties, and usability for feature 
i - ""!' tracking of four solar feature-detection algorithms: the Solar Monitor Active Re- 

Ph ! gion Tracker (SMART) detects active regions in line-of-sight magnetograms; the 

^\ ' Automated Solar Activity Prediction code (ASAP) detects sunspots and pores in 

, white-light continuum images; the Sunspot Tracking And Recognition Algorithm 

Q-r (STARA) detects sunspots in white-light continuum images; the Spatial Possi- 

q , bilistic Clustering Algorithm (SPoCA) automatically segments solar EUV images 

5h ' into active regions (AR), coronal holes (CH) and quiet Sun (QS). One month of 

data from the SOHO/MDI and SOHO/EIT instruments during 12 May -23 June 
i ^ i 1 2003 is analysed. The overall detection performance of each algorithm is bench- 

marked against National Oceanic and Atmospheric Administration (NOAA) and 
^ — I 1 Solar Influences Data Analysis Centre (SIDC) catalogues using various feature 

properties such as total sunspot area, which shows good agreement, and the 
number of features detected, which shows poor agreement. Principal Component 
Analysis indicates a clear distinction between photospheric properties, which are 
highly correlated to the first component and account for 52.86% of variability in 
, the data set, and coronal properties, which are moderately correlated to both the 

f^*) ' first and second principal components. Finally, case studies of NOAA 10377 and 

t-H , 10365 are conducted to determine algorithm stability for tracking the evolution 

of individual features. We find that magnetic flux and total sunspot area are 
J> , the best indicators of active-region emergence. Additionally, for NOAA 10365, 

it is shown that the onset of flaring occurs during both periods of magnetic-flux 
emergence and complexity development. 
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1. Introduction 



In the 1960s, NASA launched the Pioneer 6, 7, 8, and 9 spacecraft, that were 
tasked with observing the solar wind and interplanetary magnetic fields, forming 
the first space-based space- weather network and recording 512 bits per second. 
By comparison, the recently launched Solar Dynamics Observatory (SDO) is 
currently relaying solar data back to Earth at a rate of 150 000000 bits per 
second. With SDO returning the equivalent of an image with 4096 by 4096 
pixels every second, human analysis of every image would require a large team 
of people working 24 hours a day. The technological advances that have allowed 
the increased flow of data, such as improving communication bandwidths and 
onboard processing power, allows us to record data with a much greater cadence 
and spatial resolution than ever before. However, there are problems with the 
storage, transfer, and analysis of such a large flow of data. SDO generates around 
1 TB of data per day which is unprecedented in solar physics. Getting this volume 
of data to researchers around the world, as well as storing it in convenient places 
for analysis, is essential to make good use of it. An effective solution to the 
problem is to use automated feature-detection methods, which allow users to 
selectively acquire interesting portions of the full data set. 

Development of automated solar feature detection and identification meth- 
ods has increased dramatically in recent years due to the growing volume of 
data available. An overview of the fun damental image-pro cessing techniques 
used in these algorithms is presented in Aschwanden ( 2010t ). These techniques 
are used to detect many features i n various types of observ ations at differ- 



ent heights in the solar atmosphere (jPerez-Suarez et all , 120111) . In thie present 



work, we focus on detecting sunspot groups and active regions in photospheric 
continuum images, magnetograms, and EUV images. Pr eviously, detec t ion o f 
suns pot groups in photospheric images w as investigated bv lZharkov et all ( 20041 ) 
and Curto. Blanca. and Martinez ( 20081). As well as detecting s unspot groups, 
Nguyen. Nguyen, and Nguyen ( 2005 ); Colak and Qahwa"u1 ( 2008 ) also make au- 
tomated classifications. The d e tection of active regions in magnetograms is ex- 

plored in McAteer et a/.l(E005 ): LaBonte. Georgoulis. and Rust ( 2007al ). Lefebvre and Rozelot^ 
(|2004l ) and lQahwaii and Colakl (|2006l) . while iDudok de Witl (|2006l) Introduces a 
supervised segmentation of EUV images into AR, CH, and QS regions. 

The purpose of this article is to determine the robustness of four algorithms 
for detecting and physically characterising active regions and sunspot groups 
by comparison of their outputs. We determine overall detection performance, 
the correlations between extracted feature properties using Principal Compo- 
nent Analysis, and the usability of these algorithms for tracking feature evo- 
lution over time. The to ols that we conside r are the Solar Monitor Active 
Region Tracker (SMART: iHiggins et al. , 20 111) which detects magnetic features 
using magnetograms, the Automated Solar Activity Prediction code (ASAP: 
Colak and Qahwaiil [20091 ) which detects sunspots and pores using photospheric 
intensity images, the Sunspot Tracking And Recognition Algorithm (STARA: 
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Watson et all . 120091) which also detects sunspots in photosphcric intensity im 
ages, and the Spatial Possibilistic Clustering Algorithm (SPoCA: iBarra et al. 



20091) which detects active regions in the corona using extreme ultraviolet images. 
More detail on how these algorithms operate is provided in Section [3] 

The overall performance of the algorithms is compared by determining the 
total number of features detected as well as their full-disk area. Our meth- 
ods are benchmarked against National Oceanic and Atmospheric Administra- 
tion (NOAA) and Solar Influences Data Analysis Centre (SIDC) catalogues. 
Few studies include comparisons of detection methods or different data types. 
Benkhalil et all (|2006l ) compare the detection of active regions in several data 
types using a single region-growing method. Other studies compare the detection 
of features in magnetograms using a variety of region-growing and morphological 
methods ( DeForest et al. . 2007 : Parnell et al. . 20091 ). Direct comparison of algo- 
rithms is important for their characterisation, as each is designed in a different 
way to detect features for a specific purpose. 

Correlations between the properties determined by the algorithms are in- 
vestigated using Principal Component Analysis ( Jolliffel . 12002 ). PCA has been 
used previously for various purposes in solar-physics and space-weather litera- 
ture, e.g. to detect outliers (jSarro and Berihuetd . [201lh . to reduce dimension- 



ality (jDudok de Wit and AuchereL 120071), or for exploratory data a nalysis of 
space- weather data sets ( Habash Krause. Franz, and Stevenson! . l201lh . 

Finally, the stability of the algorithms is tested for tracking feature evolu- 
tion through time. The evolution of two ARs is studied in detail, including 
their emergence in several layers of the solar atmosphere. iLites et al\ (119951 ) 



present a similar multi-layered analysis of the emergence of an AR. As non- 
potentiality increases in an AR, it may begin to exhibit enhanced coronal activ- 
ity. This effect has been studied in many articles, and it is related to dynamic 
behaviours such as helicity injection ( Morita and Mclntoshl . 120051) . turbulent 
cascades (jrlewett et aTl.l2008t[Conlon et adl2008ll2010l) . enhanced polarity sepa- 
ration line gr adient (Falconer. Moore, and Gary , 2008 ), a nd cha nges in magnetic 



connectivity ( Georgoulis and Rust . 2007 ; Ahmed et al. . 2010l ). In this article 



we study multiple behaviours in the same AR using magnetic property de- 
terminations. Finally, the decay of the AR in the corona and photosphere is 
compared. To our knowledge, this is the first time that automated feature- 
detection algorithms have been used to study temporal evolution using properties 
of magnetic non-potentiality, sunspot characteristics, and coronal activity of ARs 
simultaneously. 

The following sections detail these investigations. Observations used in this 
study are described in Section [5J and the four algorithms to be compared are 
introduced in Section [3] Our results are presented in Section 21 including an 
evaluation of the algorithms' overall performance, a correlation study of the 
complete sample of active regions and a detailed case study of two different 
active regions. Finally, a discussion of the results and concluding remarks is 
presented in Section [3] 
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2. Observations 

In this study we analyse data from the interval 12 May -23 June 2003. The 
detections obtained from each algorithm for the entire data set are studied as a 
whole in Section PI and NOAA ARs 10377 and 10365 are individually studied 
in detail in Section |4~B"1 This particular data set was selected for the diversity of 
solar features present. SOHO/MDI magnetograms are used for magnetic region 
detection by SMART, while SOHO/MDI continuum images are used for sunspot 
detection by ASAP and STARA, and SOHO/EIT images are employed for active 
region detection by SPoCA. These algorithms are described in Section [3] 

The MDI instrument on SOHO provides almost continuous observations of 
the Sun in the white-light continuum, in the vicinity of the Nil 676.78 nm 
photosphcric absorption line. These photospheric intensity images are primar- 
ily used for sunspot observations. MDI data are available in several processed 
"levels". We u s ed le vel- 2 images, which are smoothed, filtered, and rotated 
( Scherrer et "all. ll995h . SOHO provides two to four MDI photospheric intensity 



images per day with continuous coverage since 1995. 

Using the same instrument level, 1.8 line-of-sight (LOS) MDI magnetograms 
are recorded with a nominal cadence of 96 minutes. The magnetograms show the 
magnetic fields of the solar photosphere, with negative (represented as black) and 
positive (as white) areas indicating opposite LOS magnetic-field ori e ntatio ns. 

The Extreme Ultraviolet Telescope (EIT: iDelaboudiniere et all . Il995l) . on- 
board SOHO, delivers synoptic observations consisting of 1024 by 1024 images 
of the solar corona recorded in four different wavelengths every six hours. Every 
SPoCA segmentation in this article was based on a pair of 17.1 and 19.5 nm 
EIT images. All images used have been preprocessed using the standard eit- 
prep procedure of the SolarSoftware library. A fixed-centre segmentation with 
six classes was performed on the logarithms of the image pixel values. The AR 
centre values are 401.74 and 324.25 DNs -1 in 17.1 and 19.5 nm, respectively. 
These values were derived from a cumulative run of SPoCA on a data set of 



month ly EIT image pairs from February 1997 until April 2005; see iBarra et al. 
(|2009h 



In the case studies presented in Section l4~3l we compare observations of NOAA 
10365 and 10377 with flares characterised by the Reuven Ramaty High Energy 
Solar Spectroscopic Imager (RHESSI) team and distributed in the RHESSI flare 
list. 



3. Methods 

SMART, ASAP, and STARA detect photospheric features such as active regions 
and sunspots while SPoCA uses images in the extreme ultraviolet to observe 
active regions at the coronal level. In this section, we will describe each of the 
feature-detection algorithms used (Sections 13.11 13.21 13.31 and I3.4[) . The outputs 
from each of the algorithms arc associated using the method explained in Section 
1531 
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Figure 1. An example set of SMART detections from 10 June 2003. PL, UD, and UE identify 
three classes of unipolar feature, while AR denotes multipolar features. 



3.1. The SMART algorithm 



The Solar Monitor Active Region Tracker (SMART: iHiggins et all l201lh is an 



algorithm that uses magnetograms to automatically extract, characterise, and 
track active regions over multiple solar rotations - from first emergence to decay. 
This allows one to study the complete life-cycle of ARs. The algorithm uses a 
combination of image-processing techniques to determine the boundary of an 
AR. Two consecutive line-of-sight magnetograms arc smoothed using a gaussian 
kernel with a full-width at half-maximum of five pixels and thresholded by 70 G 
to identify potential features. The two detections are overlaid to identify and 
remove features that are not present in both magnetograms. The remaining 
detection boundaries are then dilated by ten pixels to create the final mask. 

Dilation is performed to include nearby decaying and plage fragments that 
may have separated from the main AR. This is intended to help conserve the 
measured polarity balance of the AR as it evolves. An example set of SMART 
detections is shown in Figure [1] In this article, SOHO/MDI LOS magnetograms 
are used for detection, but recently the algorithm has been adapted for use with 
SDO/HMI magnetograms (For near-realtime detections, see[httpT//solarmonitor.org/ smart_disk ).( 
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NOAA 10365 28-May-2003 Ising Energy Map 
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Figure 2. Left: NOAA 10365 highlighting the PSL (white contour), a linear fit to the locus 
of PSL positions (dotted red line), bipolar connection line (solid red line), and heliographic 
longitude and latitude reference lines (dotted black lines). Right: An Ising energy map of the 
same region. Red represents the magnitude of energy for each pixel from highest (light) to 
lowest (dark). Since the connection between pixels of opposite polarity is being represented, 
the energy map is only shown for one of the polarities. 



Several new physical-property modules have been added to SMART for this 
study. The tilt of an AR is obtained by measuring the angle between the line 
connecting the centroids of the largest flux-weighted positive and negative blobs 
(solid red line in left panel of Figure and the heliographic- latitude line passing 
through the centroid of the AR. The length of this bipolar connecting line [BCL] 
line is also determined and provides a measure of the relative compactness of 
an AR when compared to the evolution of AR area (left panel of Figure [2]). 
Additionally, the angle [a] detected between the best-fit line to the locus of 
pixels forming the main polarity separation line [PSL] and the BCL is measured. 
The main PSL is defined as the interface between the aforementioned largest 
flux-weighted positive and negative blobs. The temporal derivative of the angle 
a is shown to be a useful proxy for the occurrence of helicity injection in an 
AR (Morita and M cintosh. 2005), whic h may be an important flare predictor 
( LaBonte. Georgoulis. and Rustl . 2007b ). The evolution of this angle is studied 
in Section 14.31 These properties are less informative when studying AR com- 
plexes or non-bipolar ARs, since often no main axes can be discerned, making a 
description of the AR orientation impossible. 

The original Ising model is used for the analysis o f magnetic int e ractio ns and 
structures of ferromagnetic substances. Here, as by lAhmed et al. ( 2010i ). Ising 
energy is used as a proxy for magnetic connectivity and complexity within an AR 
(right panel of Figure 0) ■ We use a modified form of that given by lAhmed et al 

hold) . 



This version is calculated using 



E 



D, 



(1) 



where Bi (Bj ) are pixel values of positive (negative) linc-of-sight magnetic field 
and Dij is the spatial distance between pixels i and j. The Ising energy increases 
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Figure 3. An example set of ASAP detections from 10 June 2003. 



as the negative and positive magnetic footprints within an active region become 
more entangled. The evolution of this property is studied in Section [4.31 

The modules added to SMART for this work will be used for future large-scale 
active region studies and will be added to the pipe-line versions of SMART run- 



ning at the Heliophys ics Events Knowledgebase (http: / /www. I msal.com/hek/index. htm 1 1 



Hurlburt et al. . 201dh and include d in the Heliophysic s Integrated Observatory 



( [http://www.helio-vo.eu/index.php l lBentlev et aZ.I . I2009T ) . In the future, other prop-| 
erty modules w ill be added to calcula t e a ph ysically motivated magnetic-connectivity! 



(Hcwett et al 



measurement ( Georgoulis and Rusti . 120071) . multi-scale en ergy spectrum slope 



2008 ). and multi- fractal spectrum properties ( Conlon et al . 2008) .| 



3.2. The ASAP algorithm 

Automated Solar Activity Prediction (ASAP) is the collective name for a set of 
algorithms used to process solar image s. It is composed of a l gorith ms for sunspot, 
faculae, an d active-region detections (jColak and Qahwaiil 20081) and solar- flare 
prediction ( Colak and Qahwaiil |2009( ) . Unlike other algorithms described in this 
article, ASAP uses quick look (in GIF or JPEG format) images for its processes. 
In this article a recently developed sunspot detection algorithm for ASAP is 
used. This new sunspot- detection algorithm works with continuum images and 
is described in detail by IColak et all (| 2 1 lh . The main steps in this algorithm 
can be summarized as follows: 
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• Images are pre-processed to detect the solar disk, and to remove limb 
darkening. 

• Detected solar-disk data are converted from heliocentric coordinates to 
Carrington heliographic coordinates. 

The key point in heliographic conversion is to choose the size of the 
resulting image. If a very small image size is selected, this will cause 
truncation and loss of data. If a very large one is chosen, there will be 
many spaces in the resulting image. In this study, each heliographic 
degree is represented by ten pixels therefore the resulting heliographic 
images are 3600 by 1800 pixels. 

Initially, an empty 3600 by 1800 image is created. When the Car- 
rington longitude and latitude of all of the pixels on the solar disk 
are calculated, their pixel intensities are placed in corresponding lo- 
cations. 

The distribution of pixels representing degrees in heliocentric coordi- 
nates is not uniform due to the spherical shape of the Sun. Towards 
the limb of the Sun on a two dimensional heliocentric image each 
degree will be represented by fewer pixels although in a heliographic 
image each degree is represented with same amount of pixels. Size of 
the heliographic image is larger than heliocentric image and therefore 
there will be gaps (empty pixels that are not filled with data from 
heliocentric image) in the resulting initial heliographic image after 
conversion. 

The following algorithm is applied to estimate the pixel intensities of 
the gaps in the heliographic image. Every pixel on the heliographic 
image is examined and when a pixel without any information is found, 
its neighbouring pixels are searched using variable-size windows. First 
a 3 3 pixel window is centred on the empty pixel and the values of the 
non-empty neighbouring pixels within this window are added. If all 
the neighbouring pixels within the initial window were empty, the size 
of the window is increased by one and the process continued until at 
least one pixel with information is available within the window. Then 
the average value of the valid pixels found within the final window is 
assigned to the empty pixel. The algorithm continues until all of the 
pixels have been processed. 

After all data gaps have been filled, a smoothing algorithm using a 
3x3 linear uniform filter is applied to create the final heliographic 
image. 

• Subsequently, the following filter is applied to the Carrington heliographic 
image for detecting sunspots: 

An intensity filtering threshold value T = fi — (a x a) is calculated 
where fx is the mean, a is the standard deviation of the image, and a 
is a constant equal to 2.5. 

The intensity of each pixel in the image is compared to this T value. 
If it is less than the calculated threshold value, the pixel under con- 
sideration is marked as a sunspot. 
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Figure 4. An example set of STARA detections from 10 June 2003. 



Although heliographic conversion can be computationally expensive, it yields 
detections that are more accurate compared to the ones done on heliocentric 
coordinates. A tracking algorithm was added to ASAP for this study, finding the 
intersections between objects (e.g. sunspots) on two consecutive heliographic 
images. Since the differential rotation is very small in Carrington heliographic 
coordinates, there is no need for longitudinal corrections. ASAP tends to detect 
small sunspots, which can be classified as pores. This is useful especially when 
grouping and classifying sunspots. However, the number of tracked sunspots 
increases because most pores are only visible for a few hours on the solar disk. 
An example set of ASAP detections is shown in Figure [3] 

It was not necessary to update ASAP for this work, but some computational 
issues have been discovered. For instance, the application of ASAP to SDO/HMI 
images larger than 1024 x 1024 pixels shows that heliographic conversion algo- 
rithms must be made more efficient to tackle larger images. 

3.3. The STARA algorithm 

The Sunspot Tracking And Recognition Algorithm (STARA) code was written 
in 2008 in order to perform consiste nt long term observations of sunspots over 
Solar Cycle 23 ( Watson et ai . 20091 ). It was originally developed for use with 



SOLA: S0LA1525R2_Watson_20110901.tex; 5 September 2011; 1:18; p. 9 



C. Verbeeck, et al. 



MDI data but has since been extended for use with data from SDO as well as 
from a number of ground-based instruments. A simple detection method was 
required to speed up processing when large data sets were used and suitable 
techniques were found in the field of morphological image processing. 

The STARA detection method works as follows: The image is read in and 
inverted so that the sunspots appear as bright areas on a dark background 
for compatibility with the top-hat transform. Morphological erosion is applied, 
which removes peaks and works by treating the 2D image as a 3D surface with 
the pixel values indicating the height of the surface at that point. A probe known 
as a structuring element is then chosen (in this sphere with a radius of 

14 MDI pixels) and is "rolled" underneath the surface whilst always touching 
it. The centre of the sphere then maps out a new surface that is close to 14 
units below the original. However, any sharp peaks present (such as sunspots) 
would not allow the sphere to fit inside them and so are not represented in the 
eroded image. A morphological dilation is then performed which is identical to 
an erosion apart from the sphere being rolled on top of the surface. The dilated 
surface is subtracted from the original to leave only the sunspot peaks present. 

As the sphere rolls over the surface it also carries the limb-darkening profile 
through each step and when the final surface is subtracted from the original, the 
limb-darkening effects are automatically removed. 

More de tail on this step and the top-h at transform is given by IWatson et al 



(|2009h and iDouehertv and Lotufol (|2003h . A size filter is then applied that re- 



moves areas containing less than ten pixels as they are far more likely to be pores 
than sunspots. The remaining areas are recorded along with their locations as 
well as the number of umbral regions detected within the sunspot boundary. This 
is repeated for a number of consecutive i mages and using the solar-rotation model 
of lHoward. Harvey, and Forgach (|l990h the sunspots can be tracked throughout 



a sequential data set. This allows the evolution of individual sunspots to be 
followed as well as the overall properties of the sunspot population as a whole. 
An example of a typical set of STARA sunspot detections is given in Figure 01 

STARA has undergone very few changes over the course of this work as 
the code was well established beforehand. Nevertheless, some subtle problems 
have been discovered in the process. As sunspots approach the limb (at lon- 
gitudes greater than 75°) the sunspot position returned by the code quickly 
loses accuracy. This is a common problem with feature-detection methods as the 
geometrical foreshortening effects test the limits of automated systems. There 
are also potential problems present with bad data. Obviously the best remedy 
is to remove it altogether but with MDI it is possible to have images with only 
half of the solar disk present or with large artifacts. Both of these situations can 
have substantial effects on the detected global properties and cause problems 
with analysis. 

3.4. The SPoCA algorithm 

The Spatial Possibilistic Clustering Algorithm (SPoCA) is a multi-channel fuzzy 
clustering algo rithm that automa tically segments solar EUV images into a set 
of features; see Barra et ~al\ (l2009h for a complete presentation. It optimally sep- 



arates active regions, quiet Sun, and coronal holes, even though the boundaries 
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Figure 5. An example set of SPoCA detections from 10 June 2003. 



of these regions are not always well defined. The description of the segmentation 
process in terms of fuzzy logic was motivated by the facts that information 
provided by a solar EUV image is noisy (corruption by Poisson and readout 
noise as well as by cosmic-ray hits) and subject to both observational biases (line- 
of-sight integration of a transparent volume) and interpretation (the apparent 
boundary between regions is a matter of convention). 

SPoCA takes as input an image in one (or several) EUV passband(s) and 
uses as "feature vector" the pixel value (or the pixel value vector in the multi- 
channel case) in order to classify a pixel as belonging to one of three classes, 
namely AR, QS, and CH. SPo CA is based on a fuzzy clustering techni que called 
"Possibilistic C-Means" fPCM: iKristmapuram and KellerL Il993l Il996ft . For each 
class, it assigns a "probability" or membership value <G [0,1] to every feature 
vector. 

PCM is an iterative method that searches for three compact clusters in the 
space of feature vectors, corresponding to AR, QS and CH. In practice, this is 
achieved through a gradient-descent scheme that minimizes an objective function 
that is related to the total intracluster variance plus some penalty term. In every 
iteration, new membership values are calculated based on the class centre values. 
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The membership values are used in turn to compute the new class centres, and 
so on, until the class centres converge to within a preset accuracy. 

In order to cope successfully with intensity outlier pixels such as those affected 
by cosmic rays and proton storms, a spatial rcgularization term was added to 
the PCM objective function, forcing membership values in a neighbourhood to 
be as close as possible. By assigning each pixel to the class for which its feature 
vector has the largest membership value, the image is segmented. An example 
set of SPoCA detections is shown in Figure [5] 

Since the solar corona is optically thin, and since the intensity in EUV 
images is obtained through an integration along the line of sight, there is a 
limb-brightening effect in those images, which may hinder the segmentation pro- 
cess. Therefore, the EUV images are pre-processed so as to lower the enhanced 
brightness near the limb. The initial SPoCA class contours are automatically 
postproccsscd using a morphological opening with a circular isotropic clement 
of size unity. 

Since the publication of lBarra et al. (|2009h . the SPoCA algorithm was opti- 



mized and extended in several ways: 

• In order to gain more consistent results, we introduced some constraints on 
the penalty term of the objective function to be minimized. 

• The limb correction is now applied in a continuously increasing way towards 
the limb instead of introducing it abruptly from some point onwards. 

• For individual AR detection, first the Bright Points are removed (size 
threshold is 1500 square arcseconds) and then a spherical dilation (ra- 
dius: 12 EIT pixels) is employed to group the remaining bright blobs into 
individual active regions. 

• Individual AR are tracked through time by comparing the masks of regions 
in two consecutive time frames, taking into account differential rotation. 

SPoCA has been running in near-realtime on AIA data since Sept ember 2010 
as part of the SDO Feature Finding Project ([Martens et al. . 120111) . a suite of 



software pipeline modules for automated feature recognition and analysis for the 
imagery from SDO. The resulting A R events are automati cally ingested by the 
Heliophysics Events Knowledgebase ( Hurlburt et all l2010l) . 



SPoCA is the only algorithm presented here that detects ARs in the solar 
corona. The method is generic enough to allow the introduction of other channels 
or data. It has been applied to SOHO/EIT, SDO/ AIA, PROB A2 /SWAP, and 
STEREO/EUVI images, and could potentially be used on other multi-channel 
maps such as Differential Emission maps. In this article we focus on ARs, but 
QS and CH can also be detected and tracked. 

3.5. Association of Detected Features 

The S MART tracking module, called "Multiple Disk Passage" fMuDPie: lHiggins et all 



20111 ). is used to associate individual SMART detections of the same physical 
feature over time by comparing the centroids of all detections in consecutive 
magnetograms. Two detections are associated if their centroids match within 
5° heliographic latitude and longitude. The tracked SMART detections arc then 
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associated with the best matched detections in each of the other algorithms as 
described in the following paragraphs. 

In order to analyse the relation between the features detected by different 
algorithms, a routine developed in Python associates detections from each algo- 
rithm in two ways. First outputs from ASAP, STARA, and SPoCA are associated 
with SMART outputs based on time and location information. Second, individual 
association outputs (ASAP vs. SMART, STARA vs. SMART, SPoCA vs. SMART) 
from the first step are combined using SMART IDs and timing information. 

For associations, SMART is chosen as the base algorithm because SMART 
detections usually encircle the corresponding ASAP and STARA detections and 
they are also more stable over time than SPoCA detections, due to the frequent 
splitting and merging of coronal AR detected by SPoCA. Also, SMART detects 
magnetic regions from MDI images which are more frequently available than 
the continuum and EIT images that the other algorithms are working on. The 
association rules are described below. 

First Step: Individual associations (ASAP, STARA, SPoCA vs. SMART) 

• The time difference between the solar detections under consideration (i.e., 
sunspots from ASAP and STARA, active regions from SPoCA versus mag- 
netic regions from SMART) is calculated. 

• If the time difference between a magnetic region detected by SMART and 
a solar region detected by another algorithm is less than 0.2 Julian days 
and their hcliographic bounding boxes intersect, then these detections are 
associated. Since SPoCA does not deliver heliographic bounding boxes, a 
bounding box of 5° in longitude and latitude is assumed. 

• If the same solar detection is associated to more than one SMART region, 
only the closest (in terms of time and distance between centres) SMART 
region is selected as associated. 

• Associations are saved in separate files (three files; ASAP vs. SMART, 
STARA vs. SMART, SPoCA vs. SMART) including the selected characteris- 
tics from each algorithm. 

output that is going to be analysed. 

Second Step: Combining all of the associations 

• The SMART algorithm uses an ID for each magnetic region detected and 
in this second step, the association data saved in the three separate files 
from the first step are combined using this ID and time information. The 
association data with same SMART ID and closest timing are combined 
together. Timing information still has to be used due to the difference 
between the image times. 

• The final combined data are saved in one file. 

SMART provided 9356 detections (207 magnetic region features), ASAP 3039 
detections (952 sunspot features), STARA 1329 detections (433 sunspot features) 
and SPoCA 1222 detections (190 coronal active-region features) within the con- 
sidered time- frame. In the first step, 714 SMART detections were associated to 
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2889 ASAP detections, 550 SMART detections were associated to 1315 STARA 
detections and 1089 SMART detections were associated to 1117 SPoCA detec- 
tions. In the second step when all of these data were combined, 350 detection (33 
feature) associations were created for SMART, ASAP, STARA and SPoCA. The 
daily averages of some of the outputs such as average daily sunspot numbers, 
active region numbers and average areas are compared to the NOAA active 
region catalogue in Section |4~T1 In the considered period, NOAA recorded 217 
detections (37 features). 

In case of merging or splitting of neighbouring coronal regions as detected 
by SPoCA, the association procedure described above does not relate the new 
SPoCA detection to the corresponding SMART detection. This happened several 
times in the case studies in Section 14.31 For these cases, we applied a manual 
association of SPoCA detections to SMART detections. 



4. Results 



The feature detections from each algorithm are compared in the following sec- 
tions. First, in Section [4. II the overall detection performance of the algorithms is 
presented, and compared to the corresponding NOAA detections and the daily 
international sunspot number. Next, Principal Component Analysis is performed 
on the full set of detections to probe the overall structure of the physical prop- 
erties calculated by the algorithms in Section 14.21 Finally, in Section 14.31 the 
evolution and flare activity of NOAA 10377 and 10365 are analysed in depth, 
using physical properties determined by each algorithm. 



4.1. Algorithm Performance 



The performance of the algorithms is measured by comparing the daily total and 
average values of some of the solar feature properties to each other, to values 
reported by NOAA ( [http://www.swpc.noaa.gov/ftpdir/forecasts/SRS /README), 
and to the international daily sunspot numbers ()SIDC-Team . 2003) between 12 
May and 23 June 2003. 

A comparison of these data is provided in Figure |51 The graph on the up- 
per left side of Figure [5] compares the daily number of sunspots detected by 
ASAP and STARA to the total number of spots within NOAA regions and to 
the international sunspot number. Generally, peaks and valleys in all of the 
series follow each other but the international sunspot numbers and the sunspot 
numbers for NOAA are usually higher than the sunspot numbers for ASAP and 
STARA. When sunspots are detected manually, each umbra within a penumbra is 
counted as one sunspot, whereas the automated algorithms discussed here count 
each penumbra as one sunspot although it could have more than one umbra 
within. Therefore the difference in sunspot numbers increases when the number 
of complex sunspot regions increases. Also, the number of sunspots detected by 
ASAP is always higher than the ones detected by STARA. This is because ASAP 
tends to detect very small sunspots (sometimes pores) while STARA has a higher 
threshold for the size of sunspot candidates. 
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Figure 6. Comparison of average detection results of algorithms to reported NOAA and in- 
ternational sunspot numbers. Upper-left: Comparison of number of sunspots detected by ASAP 
and STARA and reported by NOAA and recorded international sunspot numbers. Upper-right: 
Number of regions detected by SMART and SPoCA compared with the ones reported by NOAA. 
Lower-left: Comparison of average daily sunspot areas detected by ASAP and STARA versus 
NOAA. The normalised international sunspot number is over-plotted for context. Lower-right: 
Comparison of average daily region areas detected by SMART and SPoCA. 



The graph on the upper right side of Figure |5] compares the daily number of 
regions detected by SMART and SPoCA to the daily number of NOAA regions. 
SMART and SPoCA detect more regions than NOAA because the NOAA number 
is given to a region only if it has one or more sunspots, while SMART and SPoCA 
regions do not depend on the existence of sunspots within detected boundaries. 
Because of the projection effects of large coronal loops, two close but distinct 
regions in the photosphere will often be detected by SPoCA as one region. This 
explains why SMART has a higher tally of daily regions than SPoCA. This effect 
is most visible near the solar limb. 

A comparison of the areas of ARs and sunspots as detected by the four algo- 
rithms and NOAA is presented in the lower part of FigurcEl SMART, ASAP, and 
STARA areas were corrected for the line-of-sight projection effect that decreases 
the observed area as the feature moves away from the central meridian. Since the 
line-of-sight projected area of coronal loops does not necessarily decrease with 
longitude, no systematic effect is expected for the observed SPoCA area, so the 
raw area is presented. Sunspot areas are given in millionths of solar hemisphere 
to be consistent with the units of the NOAA catalogue. 

The graph on the lower left side of Figure [6] compares the sunspot areas 
detected daily by ASAP and STARA to the NOAA sunspot areas, while the 
international sunspot number is added for context. These three time series agree 
well but there appears to be a one-day shift in NOAA sunspot areas. The ASAP 
and STARA sunspot measurements are averages of observations throughout the 
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whole day, whereas, depending on the day, the NOAA sunspot observation may 
be quite early in the day. Since sunspots emerge quickly and decay slowly, any 
sunspots that emerge late in a day are likely to be missed by NOAA, but 
registered by ASAP and STARA. The following day, the new sunspots are likely 
to remain visible, so ASAP and STARA are likely to show the same area as the 
previous day, while the NOAA Ml CM. will increase. 

The graph on the lower right side of Figure [6] shows the comparison between 
active region areas detected daily by SMART and by SPoCA. Considering we are 
dealing here with photospheric versus coronal areas, a good agreement between 
the areas is obtained. Both SMART and SPoCA areas vary smoothly. Moreover, 
they are large enough to include the whole sunspot group if present, and to 
measure changes in topology or complexity consistently. In summary, the time 
series of sunspot and AR areas are well correlated, showing similar behaviour 
in time, and the differences observed are likely due to the data and detection 
methods used. 



4.2. Principal Component Analysis 



Principal Component Analysis (PCA: ljolliffe . 2002 ) aims at reducing the dimen 



sionality of a problem. It does so by maximizing the data structure information 
in the principal component space. More precisely for a data set containing 
n observations of p variables, the principal components are the directions in 
n-dimensional variable space in which the data set exhibits maximal variance. 
In this article, PCA (based on linear values) is used to get some insi ght in the 



correlation structure of the following p variables: the Schrijvcr R value (jSchriiver 



20071 ). length of the strong gradient line, magnetic flux, maximum B field, area, 
length of the bipolc connecting line, Ising energy, and Ising energy per pixel (Ising 
E ppx) as computed by the SMART algorithm, the sunspot area and number of 
sunspots as given by ASAP, and the raw AR area, maximum, variance, kurtosis 
and skewness of the EUV intensity as computed by SPoCA. 

These variables were computed on data recorded 12 May -23 June 2003, at 
a cadence of 96 minutes for photospheric features, and of six hours for coronal 
features. Data from the various algorithms were then associated as described in 
Section 13.51 

We excluded data points corresponding to regions whose centre was more 
than 60° from the central meridian, as projection errors involved become too 
large. Table Q] lists the percentage and cumulative percentage of the variance 
explained by the principal components. The first two components explain 67% 
of the total variability in the data set. 

Figure [7] represents the variables in the plane of the first two components. 
Each variable lies within a circle of radius one in this figure. Variables that 
lie close to the circle are well represented by the first two components, while 
variables close to the origin are not. The cosine of the angle formed by the origin 
and two points on the graph of Figure [7] gives the correlation between the two 
corresponding variables. This figure thus yields a graphical representation of the 
correlation structure between variables. 
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Table 1. Percentage and cumulative percentage of the 
variance of the 15-dimcnsional variable space described 
above, that can be explained by the consecutive prin- 
cipal components. Note that the first two principal 
components comprise 67% of the variance. 

Component % variance Cumulative % variance 



1 


52.86 


52.86 


2 


14.61 


67.47 


3 


10.30 


77.77 


1 


6.55 


84.32 


5 


5.17 


89.49 


6 


3.61 


93.11 


7 


2.82 


95.93 


8 


1.61 


97.54 


9 


0.93 


98.47 


10 


0.44 


98.92 


11 


0.39 


99.30 


12 


0.29 


99.59 


13 


0.22 


99.81 


11 


0.12 


99.93 


15 


0.07 


100.00 



The first observation is that no variable is strongly anti-correlated with an- 
other in this data set. Roughly speaking, all variables evolve in the same direc- 
tion. A more detailed inspection shows that the Schrijver R value, the length 
of the strong gradient line [Lsg], and the Ising energy per pixel are strongly 
(positively) correlated to each other, as are the Ising Energy and the ASAP 
area. To a lesser extent, these five variables are all correlated to each other, as 
well as to the magnetic flux. This related behaviour of PSL length, R value, Ising 
energy, and ASAP area is apparent in Figures [9j [TTJ Q21 [15J [16] and HH These 
variables are linked to a measure of co mplexity of the AR and its capability to 
produce a flare, see Colak et al. ( 20ld) . 



The maximum and skewness of the EUV intensity are strongly correlated: 
indeed a high value of maximum EUV intensity implies a long tail in the intensity 
distribution function, hence a high skewness. Similarly, the variance and kurtosis 
of the EUV intensity are strongly correlated, which is expected since they are 
related by definition. Note that variance and kurtosis lie further from the circle 
than maximum and skewness, and hence are less well represented by the first two 
principal components. Finally, the maximum magnetic field, length of the Bipole 
Connecting Line, SPoCA area, SMART area, and ASAP number of sunspots are 
not well represented by the two first components, and hence their correlation 
structure cannot be interpreted from this plot. 

PCA also tends to separate photospheric and coronal contribution. Features 
computed at the photospheric level such as R, Lsg, Ising energy, ASAP area, and 
flux have a large contribution to the first component, which accounts for 52.86% 
of the variability in the total data set. The maximum, variance, skewness, and 
kurtosis in EUV intensity images are moderately correlated to both the first and 
second principal components. 
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Correlation between variables and Principal Components 1 and 2 




Principal Component 1 

Figure 7. The projections of the algorithm variables upon the first and second principal 
components are plotted. They provide a measure of the extent to which these variables are 
correlated with the first and second principal components. 



This study shows that a reduction in dimensionality using PCA can be per- 
formed without losing too much information. Such reduction can enhanc e the 
accuracy and robustness of a subsequent classification scheme (IJiand . 120111) that 
would aim for example at separating active regions that are prone to flares from 
quiet active regions. 

4.3. Case Studies 



In the following section we analyse the time evolution of the ARs that emerge as 
NOAA 10377 (a simple region) and 10365 (a complex, flaring region). Of special 
interest is how activity in the corona results from changes in the photosphere. 
Drawing this connection is essential for flare prediction, since the photosphere is 
more easily physically characterised than the corona, where flares actually occur. 
The photosphere-coron a conn ection is not well understo od, e.g. the work of 
Leka and Barnes! ( 20071 ) and of Handv and Schriiver ( 2001 ) , with the references 
therein. 

We compare observations of NOAA 10365 and 10377 with flares characterised 
by the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI) team 



and distributed in the RHESSI flare list( http:/ /sprg. ssl.berkeley.edu/~jimm/hessi/hsi_f lare_list.htm I | 

The flares, which have been associated with the individual ARs by the RHESSI 
team, are represented in plots (Scction l4.3|) as downward pointing arrows, whose 
size is logarithmically proportional to their peak count rate. 

4.3.1. NOAA 10377 

NOAA 10377 first emerges just before rotating onto the visible disk on 4 June 
2003. It continues to gradually develop as it progresses across the disk producing 
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Figure 8. A comparison of NOAA AR 10377 detections. ASAP sunspots arc represented by 
black crosses. The contours represent SMART in black (with NOAA 10377 outlined in red) for 
the magnetic features, SPoCA in dashed blue for coronal features, and STARA in orange for 
sunspot pcnumbrac and magenta for umbrae. 



very little activity (only one B9.1 event is listed in the NOAA events catalogue 
(http://www.swpc.noaa.gov/ftpdir/indices/events/README)). Some of the flares 
produced by 10377 may have been missed due to the presence of 10375, which 
produced many large flares, swamping any signal that could be attributed to 
10377. 

Figure \E\ shows the SMART detection of 10377 in red, while other features are 
outlined in black. The extended dashed blue contours are SPoCA AR detections 
and the small symbols and contours arc sunspot detections from ASAP and 
STARA, respectively. It is clear from Figure [3] that positions of the SMART, 
ASAP, STARA, and SPoCA detections agree quite well. Whereas the sunspots 
detected by ASAP and STARA are well confined within the SMART magnetic 
region boundary, the SPoCA region most often contains most of the SMART 
detection. In the case of coronal loop structures forming between nearby ARs, 
adjacent SPoCA detections will merge and the SMART and SPoCA ccntroids will 
diverge. This is especially apparent near the solar limb, where coronal structures 
extending above the solar surface will be superimposed. 

Figures 151 -[Til show the evolution of NOAA 10377 as it progresses across 
the disk. In the top panel of Figure [5] the Stonyhurst longitudes of the re- 
gion centroids from each algorithm are shown. The vertical dotted lines indi- 
cate where the AR magnetic bounding box edges (dashed-dotted) and centroid 
(dashed) cross —60 and 60° longitude. The cosine correction used to correct for 
line-of-sight effects on magnetic-field properties is not sufficient outside of this 
range. Also, beyond 60°, sunspot visibility is below w ^ of that at disk centre 
( Watson et all. 120091) due to the Wilson depression. 

The top panel of Figure [9] tracks the longitude of ccntroids over time. We 
see that ASAP and STARA curves are above the SMART curve on this plot, 
suggesting that the centroid of the magnetic footpoints (SMART) follows behind 
the sunspot centroids (ASAP and STARA). Since the longitudinal speed of the 
white-light and magnetic detections are the same, this implies that the following 
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Figure 9. Time series of position, area, and sunspot information characterising the evolution 
of NOAA AR 10377. The legend indicates symbols and colors for each of the detection algo- 
rithms. The axes of the area plot are split between left (SPoCA and SMART) and right (ASAP 
and STARA). The SPoCA areas have been divided by three for display. 

polarity of the AR extends beyond the embedded sunspots, while the leading 
polarity remains compact. As the NOAA region 10377 is close to 10375, this 
last region affects the SPoCA detections. From 3-6 June, SPoCA detects both 
NOAA regions within a single boundary. When this region splits into two parts 
on 6 June, the SPoCA longitude and area curves decrease abruptly, and can now 
be directly compared to the photospheric structures. This changes when the two 
NOAA regions merge again on 11 June. Whenever the region detected by SPoCA 
corresponds to the region detected by the other algorithms, all four longitudes 
agree well. 

The total sunspot area determined by ASAP and STARA (Figure [5J middle) 
is very similar except for one data point near 12 June 2003. This is due to the 
MDI image on 11 June 2003 at 1736 UT being distorted. Most of the distortion 
is visible on the south limb of the image where this area is darker than the 
rest of the solar disk. Because ASAP detects the solar disk directly from the 
image, while STARA uses FITS keywords, the determination of the solar disk 
by these two methods is different. This explains why on this image the ASAP 
sunspot area is much smaller than the STARA area: whereas the distorted area is 
detected by STARA as a large sunspot, it is completely discarded by ASAP. The 
SMART and SPoCA areas of photospheric magnetic regions and coronal active 
regions obey the same general trend as the sunspot areas, although the absolute 
scales arc different. While the area measurements are stable, the total number of 
sunspots is not. The total area is dominated by the largest sunspots, while the 
total number of spots is affected by small transients which ASAP is especially 
sensitive to. 
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Figure 10. Time series of (top) total magnetic flux, total EUV intensity, (bottom) maximum 
magnetic field, and maximum EUV intensity for NOAA AR 10377. The axes of the plots are 
split between left (magnetic-field properties, black crosses) and right (coronal properties, blue 
squares). RHESSI flares associated with the AR are indicated by downward arrows. 

In the top panel of Figure QUI the emergence of the magnetic structure of 
10377 is clearly seen in measurements of its total flux. The AR is stable until 
«8 June 2003 when a phase of rapid emergence begins, lasting until w 11 June 
when the total magnetic flux has more than doubled. Comparing Figure [9] and 
Figure [TUJ we see that the total magnetic flux increases faster than the magnetic 
area, implying that the AR magnetic fields emerge relatively faster than they 
diffuse. The same general smooth trend is observed in the SPoCA total EUV 
intensity between 6 and 11 June. After NOAA 10377 merges with 10375 on 11 
June, we see a clear decay of the total EUV intensity in this combined region. 
Note that both SMART flux and SPoCA total EUV intensity behave similarly 
to the region area time series. 

In the bottom panel the maximum magnetic field is much less stable than the 
flux, and shows no clear trend. The maximum SPoCA EUV intensity does not 
change significantly between 6 and 11 June for NOAA region 10377, but exhibits 
three clear peaks afterwards which can be attributed to region 10375. The first 
peak, on 11 June, can be attributed to SPoCA merging with NOAA 10375. The 
peak on 12 June, near 1300, is probably associated with the M1.0 flare in 10375 
at around 1358 UT, whereas the 13 June 0700 UT peak is probably related to 
the M1.8 (0628 UT) or C6.1 (0710 UT) flares in 10375. These flares (appearing 
in the NOAA events catalogue) arc not indicated by the RHESSI arrows, since 
we have only displayed those flares attributed to 10377. This shows that SPoCA 
maximum intensity is capable of indicating solar eruptions. 

Magnetic properties related to polarity mixing and complexity are shown in 
Figure 111! In the top panel, the angle between the bipole connection line and 
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Figure 11. Time series of (top) PSL orientation with respect to the bipole separation line, 
(middle-top) bipole separation line length (crosses) and PSL length (dashed), (middle-bottom) 
R, and (bottom) Ising energy (crosses) and Ising energy per pixel (dashed; multiplied by 1000 
for display) for NOAA AR 10377. 



polarity separation line (PSL) is presented. Since the PSL in this AR is only 
a few megameters (or pixels) long (c/. middle-top panel), this angle cannot be 
measured in a reliable way. Indeed, a small growth in the PSL detection in any 
direction can cause the angle to change dramatically. In the middle-bottom panel 
the total flux near the PSL [R] is very small until it begins to increase as a false 
PSL is detected due to the near-horizontal fields of the large leading polarity 
sunspot approaching the west limb on 12 June. 

Ising energy, a proxy for magnetic connectivity, is shown in the bottom panel. 
This property increases during the main magnetic emergence phase (« 8 to 10 
June 2003) since it is dependent on the magnetic-field strength and inversely 
dependent on the distance between individual magnetic elements. The Ising 
energy per pixel (dashed line) appears to be very susceptible to geometrical 
effects as the large decrease near the west limb and increase near the east limb 
both coincide with the formation of false PSLs in the leading sunspot. It should 
be noted that this quantity was calcul ated without remapping the data to disk- 
centre as done by I Ahmed et al. ( 2010t ). giving the measurement an even larger 



viewing-angle dependence. 
4.3.2. NOAA 10365 

Active region NOAA 10365 rotates onto the visible solar disk on 19 May 2003 
at heliographic latitude -5°. At this point 10365 is mature and decaying, having 
emerged and evolved on the far side of the Sun. On 24 May, a new bipolar 
structure rapidly emerges in the extended plage of the trailing (positive) polarity. 
NOAA switches the 10365 designation to this newly emerged bipole several days 
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Figure 12. A comparison of detection contours for NOAA AR 10365. ASAP sunspots are 
represented by black crosses. The contours represent SMART in black (with NOAA 10365 
outlined in red) for the magnetic features, SPoCA in dashed blue for coronal features, and 
STARA in orange for sunspot penumbrae and magenta for umbrae. 



later. As the bipole evolves it develops a strong double PSL by merging with the 
decayed flux. It produces many C- and M-class flares and several X-class flares. 
The AR progresses around the visible disk, eventually returning as NOAA 10386. 
The onset of decay occurs as C- and M-class flares are produced with decreasing 
frequency and the spot areas, magnetic flux, and field strengths decrease. 

Figure [T2] shows a comparison of the heliographic positions and sizes of two 
sets of SMART, ASAP, STARA and SPoCA detections of NOAA 10365. We can 
see that positions of the SMART, ASAP, STARA and SPoCA detections agree 
well. The SPoCA detection, however, includes coronal loops extending away from 
the footpoint boundary of NOAA 10365. Before 24 May, SPoCA merges NOAA 
region 10367 with its detection of 10365. From 24 May -27 May it only detects 
10365, on 27 May at 1300 UT there is a single data point where these regions are 
merged by SPoCA, and from 29 May at 0100 UT onwards, SPoCA merges them 
for the remaining observation period. The longitudes of all detections within 24 - 
27 May agree well. After 27 May the SPoCA longitude drifts, reflecting changes 
in the merged coronal structures. Unlike 10377, the magnetic centroid of 10365 
at first trails behind the sunspot centroid but then precedes it, as evidenced by 
the top panel in Figure [T3"l This is because the new bipole, which develops many 
spots, emerges behind the existing weakly spotted bipole. The new emergence 
is clear in the plot of total sunspot area (middle panel), and is unclear in the 
magnetic and EUV area plots since the new bipole emerges partially within the 
boundary of the old one. Note that all areas for NOAA 10365 are much larger 
than those for simpler region 10377. For SMART, area is very sensitive to weak 
magnetic plage, however. This can be seen in the sudden jumps around May 25 
and 28, which are due to nearby plage temporarily merging with the AR. The 
jump in STARA area on 27 May can be attributed to a bad data file (note that 
there is no ASAP data point at that time). 

From 25 May onwards, the total magnetic flux increases gradually to over 
four-fold the initial value during development and levels off around 29 May (sec 
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Figure 13. Time series of position, area, and sunspot information characterising the evolu- 
tion of NOAA AR 10365. The legend indicates symbols and colors for each of the detection 
algorithms. The axes of the area plot are split between left (SPoCA and SMART) and right 
(ASAP and STARA). The SPoCA areas have been divided by three for display. 
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Figure 14. Time series of (top) total magnetic flux, total EUV intensity, (bottom) maximum 
magnetic field, and maximum EUV intensity for NOAA AR 10365. The axes of the plots are 
split between left (magnetic-field properties, black crosses) and right (coronal properties, blue 
squares). RHESSI flares associated with the AR are indicated by downward arrows. 
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Figure 15. Time series showing proxies for the complexity and polarity mixing in NOAA 
AR 10365. (middle-top) bipole separation line length (crosses) and PSL length (dashed), 
(middle-bottom) R, and (bottom) Ising energy (crosses) and Ising energy per pixel (dashed; 
multiplied by 1000 for display) 

Figure [T4"), . The maximum magnetic field increases abruptly on 25 May and also 
increases over time, albeit less smoothly than the magnetic flux. The maximum 
magnetic field did not show an overall increasing or decreasing trend in the case 
of simpler NOAA region 1 0377. 

The time series of SPoCA maximum intensity exhibits some peaks, which can 
be related to the following flares produced by NOAA f0365: the Ml. 9 flare at 
0534 UT on 26 May, the M1.6 flare at 0506 UT on 27 May, the X1.2 flare at 
005f UT on 29 May, and the M9.3 flare at 0213 UT on 31 May. The last two 
flares arc even visible in the total SPoCA intensity, which shows more or less a 
gradual increase over time, but less smooth than in the case of the simpler NOAA 
region 10377. The flares not picked up by SPoCA likely occurred in between EIT 
images. 

Signatures in the evolution of the magnetic topology of NOAA 10365 precede 
its intense coronal activity, indicated by the associated RHESSI flares in Figure 
IT5l Just before 25 May 2003 the new emergence causes a jump in the main bipole 
separation line length. As the emergence continues and strong PSLs develop, this 
length decreases, while the total PSL length increases, as shown in the middle- 
top panel of Figure [TS] Also, there are signs of gradual helicity injection as the 
angle between the main bipole connection line and the main PSL grows from 
near perpendicular (90°) to around 120° (top panel). The flux near PSL [R] 
grows during this time, as does the Ising energy (middle-bottom and bottom 
panels, respectively). A bump in R just before 26 May is followed by an intense 
RHESSI flare. Intense flaring begins again around the second bump in R on 28 
June. Examining the development of Ising energy, it appears that sharp increases 
in the property arc followed by the most intense flaring. 
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Figure 16. Time series of position, area, and sunspot information characterising the decay 
phase of NOAA AR 10365 (renamed 10386) during its second disk passage. The legend indicates 
symbols and colors for each of the detection algorithms. The axes of the area plot are split 
between left (SPoCA and SMART) and right (ASAP and STARA). The SPoCA areas have been 
divided by three for display. 

NOAA 10365 returns for a second disk passage, renamed 10386. We are able to 
observe its decay phase, as shown in Figures |T5] [T5] As no RHESSI data on flares 
is available for this period, no flare arrows were added to these figures. While 
the longitude of the SMART magnetic centroid increases linearly with time, 
the ASAP and STARA sunspot ccntroids show small departures from this line 
between 19 and 21 June, preceding the magnetic centroid. The SPoCA detection 
of NOAA 10386 merges with 10388 and 10389, so a direct comparison with the 
other algorithms cannot be made. 

The magnetic area does not change significantly, but the total sunspot area 
clearly decreases (middle panel, Figure [TH]), and has already decreased substan- 
tially since the previous disk passage (as NOAA 10365). The total magnetic 
flux decreases (top panel, Figure IT7|) as its magnetic fields diffuse and weaken. 
Comparing the values to FigureHH we notice that the flux had already decreased 
significantly since the previous solar rotation. The total EUV intensity does not 
change substantially, regardless of the weakening magnetic footpoints, although 
it has decreased since the previous solar rotation. Its increase on 22 June is 
due to the detection merging with a large region near the limb. The maximum 
magnetic-field value shows a gradual, although not very smooth decrease, and 
has also decreased since the previous passage. The maximum EUV intensity docs 
not show a clear trend, and although several jumps are detected, the intensity 
levels are much less than those associated with flares over the previous passage. 
The peak on 18 June at 0100 UT, for instance, can likely be associated to the 
M6.8 flare produced by region 10386 at 2227 UT on 17 June. The PSL length has 
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Figure 17. Time series of (top) total magnetic flux, total EUV intensity, (bottom) maximum 
magnetic field, and maximum EUV intensity for NOAA AR 10365 on its second disk passage 
as 10386. The axes of the plots are split between left (magnetic-field properties, black crosses) 
and right (coronal properties, blue squares). 
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Figure 18. Time series showing proxies for the complexity and polarity mixing in NOAA 
10386. (top) PSL orientation with respect to the bipole separation line, (middle-top) bipolc 
separation line length (crosses) and PSL length (dashed), (middle-bottom) R, and (bottom) 
Ising energy (crosses) and Ising energy per pixel (dashed; multiplied by 1000 for display) 
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decreased since the previous solar rotation, and shows a further gradual decrease 
in Figure 1181 The same is true for both R and the Ising energy. 

5. Discussion and Conclusion 

In this article the performance of the presented algorithms is tested in several 
ways: the overall detection performances are investigated as compared to es- 
tablished methods; the correlations between extracted physical properties are 
determined using Principal Component Analysis; the stability and usefulness 
of the algorithms and of various feature properties for studying feature time- 
evolution is tested. Additionally, some lessons were learned about data analysis 
using automated detection algorithms. The following relates the main results of 
these investigations. 

In Section IO we compared ASAP, STARA, SIDC, and NOAA sunspot detec- 
tions and found out that although different numbers of features are detected, 
the total disk areas agree well. The same is true for the comparison between 
SMART and SPoCA active-region detections. The difference in numbers can 
be explained as follows: ASAP is sensitive to pores while STARA only detects 
developed sunspots. Pores clearly do not add significant area to the total disk 
coverage, while there is a large number of them detected. NOAA sunspot counts 
include both penumbrae and umbral cores, whereas ASAP and STARA only count 
sunspot groups. Consequently, sunspot counts should be very carefully assessed 
when applied to a long-term inhomogencous historical record. NOAA only counts 
those active regions that possess sunspots, which is not the case for SMART 
nor SPoCA. SMART detects both small AR fragments as well as complexes of 
bipolar structures, while SPoCA is sensitive to bright loop connections between 
adjacent features, often merging them into a single detection. These differences 
are partially dependent on thresholds chosen by the developers and may be tuned 
to return an agreeable feature boundary. Boundaries are inherently arbitrary as 
there is no established definition of each feature. Additionally, features evolve 
continuously and are prone to merging and fragmentation, so universally defining 
a feature is very difficult. The summed area of features is a preferred quantity 
as it is both more stable and less ambiguous. 

The use of Principal Component Analysis (Section I4.2[) has allowed us to 
determine which feature properties contain the most information about our data 
set. PCA also tends to separate photospheric and coronal contribution. Features 
computed at the photospheric level such as R, Lsg, Ising energy, ASAP sunspot 
area, and magnetic flux are highly correlated to each other and have a large 
contribution to the first component, which accounts for 52.86% of the variability 
in the total data set. The maximum, variance, skewness, and kurtosis in EUV 
intensity images are highly correlated to each other and moderately correlated to 
both the first and second principal components. By reducing dime nsiona l ity th e 
accuracy and robustness of a classification scheme can be enhanced (|jianeLl20lih . 
For example, this could be used to discriminate between flaring and non-flaring 
active regions properties. 

Through the time series analysis of two AR case studies (Section I4.3[) . we 
have observed three physical processes evident in their evolution: emergence of 
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a bipolar magnetic structure, sunspots, and EUV loops (Section l4.3.ip ; increase 
and peak in non-potentiality, followed by the onset of flaring (Section I4.3.2[) ; 
decay and weakening of magnetic footpoints ( Section I4.3.2[) . We find that the 
algorithms show good correspondence between centroid positions and areas but 
significant divergence is seen in other properties. 

In the case study of the simple active region NOAA 10377 (Section 14.3.1 [) 
we see that the total number of detected sunspots fluctuates wildly as transient 
spots rapidly emerge and disappear. This is partly due to the visibility curve 
since the area of small spots i s highly impacted by the o bserver's viewing angle 
(jDalla. Fletcher, and Waltonl . 120081: IWatson et aU , \200$) . In order to study AR 



evolution, sunspot area is more indicative of the emergence and decay of an AR 
than coronal or magnetic area, which do not necessarily decrease during decay 
(see Section |4"3)) . As a basis for long-term AR tracking, magnetic flux is more 
useful than the sunspot area (since sunspots arc much more transient than their 
magnetic footprints) or maximum magnetic field value (sin ce it is affected by the 



MDI saturation problem iLiu. Norton, and Scherrerl ([2007J)). Also, the maximum 



magnetic-field value is unstable since different positions in the active region will 
over-take each other in field magnitude as they develop, causing the location 
that the value is sampled from to vary wildly. Finally, the total EUV intensity 
determined by SPoCA has a smooth behaviour over time and is closely linked to 
area. The maximum EUV intensity peaks when the active region emits a large 
flare, and appears to be a useful indicator of eruptions in the corona. 

The case study of complex, flaring NOAA 10365 (Section I4.3.2|) shows that 
flaring can happen both in periods of flux emergence as well as non-potentiality 
enhancement in an active region. Following the initial flaring during the emer- 
gence phase of evolution, further flaring occurs as the main PSL rotates with 
respect to the bipole connection line. This is a sign of helicity injection and is 
coincident with increases of other properties related to polarity mixing. Helicity 
injection has been established as a method of increasing non-potentiality and 
may be caused b y the emergence of subsurface twisted flux ropes, as seen in 
Dun et all (|2007l ). 



As NOAA 10365 returns after one solar rotation, decay is seen in the strength 
of its magnetic footprint. However, the area is not seen to decrease significantly, 
since supergranular diffusion causes a radial dispersal of magnetic elements. 
Coronal structure s do not appear to decay readily, either. This result agrees with 



Lites et all (|I995[ ). where it is reasoned that if the coronal magnetic structure 
is closed, it may be in a state of quasi-static equilibrium, whereby the magnetic 
buoyancy of the loops is cancelled by the weight of plasma trapped at the bottom 
of the closed structure. 

By performing these studies, it is found that magnetic active-region detections 
provide the most stable base for feature tracking. Sunspots are only visible for 
short periods of time, and coronal detections continually form bright loop connec- 
tions to nearby features. The simple feature tracking method used in this article 
(see Section l3~5j) is novel in that it allows features to be tracked between multiple 
disk passages. This is essential for analysing the complete life-cycle of an active 
region, as exemplified in the analysis of NOAA 10365 (Scction l4.3.2p . Future work 
on active-region evolution should combine morphological information to better 
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handle merging and splitting (as done by Welsch and Longcopel ( 20031 )) with our 
method of multiple disk passage tracking. Future work on our algorithms will 
also address the automatic detection and handling of structural or visible errors 
in solar data, to avoid discontinuities in the time series due to a corrupted image, 
as was seen in the STARA outputs used in the case studies (see Section f4. 3. II) . 

This work will be expanded in the future to include an analysis of the full 
SOHO archive as well as detailed studies of photospheric and coronal SDO data 
sets. Many physical studies will benefit from this work, as inv estigations that 
examine coronal he ating as a result of large scale magnetic fields dSchriiver L 1987 1 



Fisher et al. , 1998 ) , coupling between the p hotosphere and coronafHandv and Schriiverl .[ 



2001 ). sources of coronal mass ejections dSubramanian and Derel. 2001), flux 
emer gence and distribution ( Liu and Kurokawall2004l:lAbramenko and Longcopel .[ 
2005 ) and flare forecasting ( Gallagher. Moon, and Wand . l2002h can all be re- 



peated with these automated detection methods. Using these methods allows a 
far greater number of features to be analysed and reduces human bias in the 
detection of features in the solar data. 

The algorithms presented here are automated (once thresholds have been 
fixed), independent, and unsupervised. Although some development remains to 
be done, they detect features the way that they are intended, and will provide 
useful additions to the SDO pipeline feature-detection methods. However, this 
work shows that automated methods cannot replace human data analysis but 
they can help to stream-line the process. 
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